You are viewing the RapidMiner Studio documentation for version 10.2.
HTML to XML (Text Processing)
Synopsis
This operator converts an HTML document into an XML/XHTML document.
Description
The HTML to XML operator takes a document in HTML format and parses it into strict XHTML, repairing constructs such as unclosed stand-alone tags along the way. This is useful when an XHTML document is required or when the document must be fully valid.
Input
document (Document): The HTML document to be transformed.
Output
document (Document): The resulting XHTML document.
Tutorial Processes
Replace invalid HTML tags
In this example, we first generate an HTML document that contains many tags that do not conform to XHTML, such as an unclosed li, unclosed stand-alone tags, and <H1> instead of <h1>.
We then pass this document to the HTML to XML operator.
When we open the results, we see that the operator has replaced all invalid tags with their valid representations.
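The kind of repair described in this tutorial can be illustrated outside RapidMiner with a minimal Python sketch. This is not the operator's implementation, which is not documented here; it only shows the three fixes named above (lowercasing tag names, closing an unclosed li, and self-closing stand-alone tags) using the standard library. The names `XhtmlWriter`, `html_to_xhtml`, and `VOID_TAGS` are hypothetical, and the sketch deliberately ignores comments, doctypes, and processing instructions.

```python
from html.parser import HTMLParser
from xml.sax.saxutils import escape, quoteattr

# Stand-alone (void) elements that XHTML requires to be self-closed.
# Assumption: a small illustrative subset, not the full HTML list.
VOID_TAGS = {"br", "hr", "img", "input", "meta", "link"}


class XhtmlWriter(HTMLParser):
    """Hypothetical helper that re-emits parsed HTML as well-formed XHTML."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []    # emitted markup fragments
        self.stack = []  # currently open, non-void elements

    def handle_starttag(self, tag, attrs):
        # HTMLParser already lowercases tag and attribute names.
        if tag == "li":
            # Close a still-open <li> in the same list before opening a new one.
            for open_tag in reversed(self.stack):
                if open_tag in ("ul", "ol"):
                    break
                if open_tag == "li":
                    self._close_through("li")
                    break
        # Valueless attributes such as `disabled` become disabled="disabled".
        attr_text = "".join(
            f" {name}={quoteattr(value if value is not None else name)}"
            for name, value in attrs
        )
        if tag in VOID_TAGS:
            self.out.append(f"<{tag}{attr_text} />")
        else:
            self.out.append(f"<{tag}{attr_text}>")
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Void tags were already self-closed; stray end tags are dropped.
        if tag not in VOID_TAGS and tag in self.stack:
            self._close_through(tag)

    def handle_data(self, data):
        self.out.append(escape(data))

    def _close_through(self, tag):
        # Pop and close open elements until `tag` itself has been closed.
        while self.stack:
            top = self.stack.pop()
            self.out.append(f"</{top}>")
            if top == tag:
                break

    def close_all(self):
        # Close every element that is still open at end of input.
        while self.stack:
            self.out.append(f"</{self.stack.pop()}>")


def html_to_xhtml(html):
    writer = XhtmlWriter()
    writer.feed(html)
    writer.close()
    writer.close_all()
    return "".join(writer.out)


if __name__ == "__main__":
    sample = "<HTML><body><H1>Title</H1><ul><li>one<li>two<br></ul></body></HTML>"
    print(html_to_xhtml(sample))
```

For real documents, a dedicated HTML tidying library is likely a better choice than this sketch, since it covers only a handful of the differences between HTML and XHTML.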
